A Rule-based Syllable Segmentation of Myanmar Text

Authors

  • Zin Maung Maung
  • Yoshiki Mikami
Abstract

Myanmar script does not put spaces between words, so syllable segmentation is an important step in many NLP tasks such as word segmentation, sorting, and line breaking. This study proposes a rule-based syllable segmentation algorithm for Myanmar text. Segmentation rules were derived from the syllable structure of Myanmar script, and the algorithm was designed around those rules. A segmentation program was developed to evaluate the algorithm. A training corpus containing 32,283 Myanmar syllables was run through the program, and the experimental results show a segmentation accuracy of 99.96%.
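
The abstract does not list the actual segmentation rules, so the following Python sketch is only an illustration of what a rule-based syllable segmenter can look like, not the authors' algorithm. It assumes a single simplified rule: a syllable starts at a consonant (U+1000 to U+1021) unless that consonant is followed by asat (U+103A) or virama (U+1039), or is preceded by virama (a stacked consonant). The name `segment_syllables` is hypothetical, and digits, punctuation, and independent vowels are deliberately not handled.

```python
import re

# Assumed character classes (Unicode Myanmar block); not taken from the paper.
CONSONANT = "\u1000-\u1021"  # KA .. A
ASAT = "\u103A"              # visible vowel killer
VIRAMA = "\u1039"            # invisible stacking sign

# A consonant opens a new syllable unless it is stacked under the previous
# consonant (preceded by virama) or closes the current syllable
# (followed by asat or virama).
BOUNDARY = re.compile(
    "(?<!" + VIRAMA + ")[" + CONSONANT + "](?![" + ASAT + VIRAMA + "])"
)


def segment_syllables(text):
    """Split text into syllable-like chunks at the assumed boundaries."""
    starts = [m.start() for m in BOUNDARY.finditer(text)]
    # Keep any leading characters that precede the first detected boundary.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s < e]


# "Myanmar" written in Myanmar script: MA, medial RA, NA, asat, MA, sign AA.
word = "\u1019\u103C\u1014\u103A\u1019\u102C"
print(segment_syllables(word))
```

In this example the final NA carries an asat, so it stays attached to the first syllable, and the second MA opens a new one. A production segmenter would need further rules for the cases left out here.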

Similar resources

Identification of Adopted Pali Words in Myanmar Text

The Myanmar language has been significantly influenced by Pali through the practice of Buddhism and the study of Buddhist literature in Myanmar. As a result, Pali words have been widely adopted and used in the Myanmar language. This study presents an algorithm for identifying Myanmar-adopted Pali words in Myanmar text. The system employs a combination of rule-based syllable segmentation and a dicti...

Unsupervised and Semi-supervised Myanmar Word Segmentation Approaches for Statistical Machine Translation

In statistical machine translation (SMT), word segmentation is generally a necessary step for languages that do not naturally delimit words. For many low-resource languages there are no word segmentation tools, and research on word segmentation for these languages is often quite scarce. In this paper, we study several plausible methods for Myanmar word segmentation for machine translation in or...

A Romanian Syllable-Based Text-To-Speech System

In this article we present how we built a syllable-based TTS system for Romanian. The system contains a text analyser capable of separating syllables from input text and detecting accentuation, a vocal database of recorded syllables, a unit matching module, and a synthesizer. The analyser was built using a LEX generator by means of two sets of phonetic rules. Vocal database was generated t...

Myanmar Word Segmentation using Syllable level Longest Matching

In Myanmar language, sentences are clearly delimited by a unique sentence boundary marker but are written without necessarily pausing between words with spaces. It is therefore non-trivial to segment sentences into words. Word tokenizing plays a vital role in most Natural Language Processing applications. We observe that word boundaries generally align with syllable boundaries. Working directly...

A Syllable Based Continuous Sp

This paper presents a novel technique for building a syllable-based continuous speech recognizer when unannotated transcribed training data is available. We present two different segmentation algorithms to segment the speech and the corresponding text into comparable syllable-like units. A group-delay-based two-level segmentation algorithm is proposed to extract accurate syllable units from the sp...

Publication date: 2008